- 01. Introduction
- 02. Motivation
- 03. Bias and Variance
- 04. Two Ways of Estimating Expected Returns
- 05. Baselines and Critics
- 06. Policy-Based, Value-Based, and Actor-Critic
- 07. A Basic Actor-Critic Agent
- 08. A3C: Asynchronous Advantage Actor-Critic, N-step
- 09. A3C: Asynchronous Advantage Actor-Critic, Parallel Training
- 10. A3C: Asynchronous Advantage Actor-Critic, Off- vs On-policy
- 11. A2C: Advantage Actor-Critic
- 12. A2C Code Walk-through
- 13. GAE: Generalized Advantage Estimation
- 14. DDPG: Deep Deterministic Policy Gradient, Continuous Actions
- 15. DDPG: Deep Deterministic Policy Gradient, Soft Updates
- 16. DDPG Code Walk-through
- 17. Summary